{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reddit aitextgen\n",
    "\n",
    "A demo on how aitextgen can be used to create bespoke Reddit submission titles.\n",
    "\n",
    "**WARNING**: The content of the output may be NSFW!\n",
    "\n",
    "**NOTE**: This is released as a proof of concept for mini-GPT-2 models; quality of titles may vary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from aitextgen import aitextgen"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading the Reddit Model\n",
    "\n",
    "The `minimaxir/reddit` model was finetuned on Reddit submissions up until August 2019, from the top 1,000 posts for each of the [top 1,000 subreddits](https://docs.google.com/spreadsheets/d/1zInLaR3daOC3N2ZBkudG5ypgJSaEwXqHRo9VazPFq-w/edit?usp=sharing) by unique submitters (an equal number of posts from each subreddit is important to prevent sampling bias).\n",
    "\n",
    "It uses a custom GPT-2 architecture that is only 30 MB on disk (compared to 124M GPT-2's 500MB on disk.)\n",
    "\n",
    "Running the cell will download the model and cache it into `/aitextgen`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:aitextgen:Loading minimaxir/reddit model from /aitextgen.\n",
      "INFO:aitextgen:Using the tokenizer for minimaxir/reddit.\n"
     ]
    }
   ],
   "source": [
    "ai = aitextgen(model=\"minimaxir/reddit\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generation\n",
    "\n",
    "Since the model is so small, generation happens almost immediately, even in bulk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "halo Damn...\n"
     ]
    }
   ],
   "source": [
    "ai.generate()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tifu TIFU by having a boner at my girlfriend's house\n",
      "==========\n",
      "television The Amazon Prime Minister's New Year, and a General Library Is Family With 'Realistic'\n",
      "==========\n",
      "summonerswar My friends and I are getting a new new tier list!\n",
      "==========\n",
      "congratslikeimfive I've been born and it's been an hour since I was 6.\n",
      "==========\n",
      "cakeday I'm still a bit inexperienced with it.\n"
     ]
    }
   ],
   "source": [
    "ai.generate(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prompted Input\n",
    "\n",
    "You can seed input with a `prompt` to get specific types of Reddit posts. The prompt will be **bolded** in the output.\n",
    "\n",
    "Since the lowercase name of the subreddit is always first, you can use that in a prompt as a control code to (roughly) ensure output is from that subreddit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1maskreddit What\u001b[0m’s your favourite most unique book you’ve ever read?\n",
      "==========\n",
      "\u001b[1maskreddit What\u001b[0m's the biggest difference between a pedophile and a child?\n",
      "==========\n",
      "\u001b[1maskreddit What\u001b[0m's a way to avoid a risk of your life?\n",
      "==========\n",
      "\u001b[1maskreddit What\u001b[0m's the most horrible thing you've ever done for your SO?\n",
      "==========\n",
      "\u001b[1maskreddit What\u001b[0m’s your most illegal/vocal/miscarriage/sort of Reddit?\n"
     ]
    }
   ],
   "source": [
    "ai.generate(5, prompt=\"askreddit What\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1mgaming The Witcher 3\u001b[0m will be the first game in a row when it's in a game\n",
      "==========\n",
      "\u001b[1mgaming The Witcher 3\u001b[0m is looking for an end game\n",
      "==========\n",
      "\u001b[1mgaming The Witcher 3\u001b[0m is coming.\n",
      "==========\n",
      "\u001b[1mgaming The Witcher 3\u001b[0m\n",
      "==========\n",
      "\u001b[1mgaming The Witcher 3\u001b[0m is now in the new DS3\n"
     ]
    }
   ],
   "source": [
    "ai.generate(5, prompt=\"gaming The Witcher 3\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1mrelationships My parents\u001b[0m [25/M] made me [27/F] with a \"mother\". The hospital has finally done it. I'm starting to believe it's been a long day today.\n",
      "==========\n",
      "\u001b[1mrelationships My parents\u001b[0m (26F) told me that my boyfriend (27M) of 4 years is pregnant, and I have a long long long partner and have no idea how to afford it.\n",
      "==========\n",
      "\u001b[1mrelationships My parents\u001b[0m [28M] wants me [27F] that I'm not sure how to respond to my neighbor's [21F] relationship with my girlfriend [31F] of 3 years. Should I be worried about her?\n",
      "==========\n",
      "\u001b[1mrelationships My parents\u001b[0m [33F] broke my boyfriend [28F] with my best friend [26F] because I don't want to be a gentleman. What do I do?\n",
      "==========\n",
      "\u001b[1mrelationships My parents\u001b[0m [27F] cheated on me (27F) with my girlfriend (26M) and my husband (22F) don't know how to deal with her boyfriend (27M)\n"
     ]
    }
   ],
   "source": [
    "ai.generate(5, prompt=\"relationships My parents\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1mamitheasshole AITA\u001b[0m for telling my son to be a parent because she was a kid in middle school?\n",
      "==========\n",
      "\u001b[1mamitheasshole AITA\u001b[0m for telling my son she was being an adult?\n",
      "==========\n",
      "\u001b[1mamitheasshole AITA\u001b[0m for not telling my wife that she is pregnant?\n",
      "==========\n",
      "\u001b[1mamitheasshole AITA\u001b[0m for telling my wife that I’m not pregnant if it’s my best friend?\n",
      "==========\n",
      "\u001b[1mamitheasshole AITA\u001b[0m for not giving up my child over my brother's wedding?\n"
     ]
    }
   ],
   "source": [
    "ai.generate(5, prompt=\"amitheasshole AITA\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bulk Generation to File\n",
    "\n",
    "You can use `generate_to_file()` to create many Reddit titles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:aitextgen:Generating 1,000 texts to ATG_20200518_001105_30355161.txt\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "63e3b99f783c4457a9342714ff009963",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, max=1000.0), HTML(value='')))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "ai.generate_to_file(1000, batch_size=20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MIT License\n",
    "\n",
    "Copyright (c) 2020 Max Woolf\n",
    "\n",
    "Permission is hereby granted, free of charge, to any person obtaining a copy\n",
    "of this software and associated documentation files (the \"Software\"), to deal\n",
    "in the Software without restriction, including without limitation the rights\n",
    "to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n",
    "copies of the Software, and to permit persons to whom the Software is\n",
    "furnished to do so, subject to the following conditions:\n",
    "\n",
    "The above copyright notice and this permission notice shall be included in all\n",
    "copies or substantial portions of the Software.\n",
    "\n",
    "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n",
    "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n",
    "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n",
    "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n",
    "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n",
    "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n",
    "SOFTWARE."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.5 64-bit",
   "language": "python",
   "name": "python37564bitb9ff4e3157b244a896f88d1e5f3eb324"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}